ABSTRACT

A key step in proliferation of retroviruses is the 
conversion of their RNA genome to double-stranded 
DNA, a process catalysed by multifunctional reverse 
transcriptases (RTs). Dimeric and monomeric RTs 
have been described, the latter exemplified by 
the enzyme of Moloney murine leukaemia virus. 
However, structural information is lacking that 
describes the substrate binding mechanism for a 
monomeric RT. We report here the first crystal struc- 
ture of a complex between an RNA/DNA hybrid sub- 
strate and polymerase-connection fragment of the 
single-subunit RT from xenotropic murine leukaemia 
virus-related virus, a close relative of Moloney 
murine leukaemia virus. A comparison with p66/ 
p51 human immunodeficiency virus-1 RT shows 
that substrate binding around the polymerase 
active site is conserved but differs in the thumb 
and connection subdomains. Small-angle X-ray 
scattering was used to model full-length xenotropic 
murine leukaemia virus-related virus RT, 
demonstrating that its mobile RNase H domain
becomes ordered in the presence of a substrate, a
key difference between monomeric and dimeric RTs.

INTRODUCTION 

To proliferate, retroviruses must integrate their genetic 
information into the genome of the infected cell. As the 
retroviral genome is encoded in single-stranded RNA, it is 
converted to double-stranded DNA (dsDNA) through a
multi-step process (1) using the RNA- and
DNA-dependent DNA polymerase and ribonuclease H 
(RNase H) activities of the viral reverse transcriptase 
(RT). Reverse transcription initiates from host-derived 
tRNA hybridized to the primer binding site near the 
5'-end of the viral genome and proceeds until RT 
reaches the extreme 5' terminus of the genome, thereby 
creating (−) strand strong-stop DNA. RNase H activity
degrades the RNA strand of the resulting RNA/DNA
hybrid, liberating the nascent strand of (−) DNA and
allowing it to hybridize with the 3' end of the genome
through a process designated (−) strand transfer. As (−)
DNA synthesis resumes, RNase H activity continues to 
degrade the RNA strand in the resulting RNA/DNA 
hybrid, with the exception of one or two short polypurine 
tracts (PPTs) that prime synthesis of (+) strand 
strong-stop DNA. After a second strand transfer event 
and release of the tRNA and PPT primers, bidirectional 
DNA synthesis produces the integration-competent 
double-stranded viral DNA. 

The N-terminal DNA polymerase domains of RTs 
resemble other nucleic acid polymerases and have been 
likened to a right hand with subdomains designated 
fingers, palm and thumb (2). A fourth subdomain, the 
connection, links the DNA polymerase and C-terminal 
RNase H domains. Both dimeric and monomeric retro- 
viral RTs have been described, the former exemplified by 
the human immunodeficiency virus-1 (HIV-1) enzyme. 
HIV-1 RT is an asymmetric heterodimer of 66 and 
51 kDa subunits (p66 and p51) that are proteolytically 
cleaved from the gag-pol precursor during virus matur- 
ation (3). The p66 subunit contains the DNA polymerase 
and RNase H domains, whereas p51 lacks an RNase H 
domain, has an altered conformation relative to the
equivalent segment of its p66 counterpart and provides a
structural platform that serves to support and activate the 
larger subunit. A similar subunit organization has been 
described for the RTs of related lentiviruses (4,5). The 
best-characterized monomeric RT is the ~75 kDa
enzyme from the gammaretrovirus Moloney murine
leukaemia virus (Mo-MLV) (6-8). One important
consequence of the dimeric versus monomeric architecture
of RTs is the placement of their RNase H domain. The
RNase H domain of HIV-1 RT is rigidly placed on the p51
subunit platform (2), whereas the Mo-MLV RNase H
counterpart is connected to the rest of the enzyme via a
flexible linker and assumed to be mobile (9).

HIV-1 RT has been extensively characterized
structurally and to this point is the only RT for which
structures of complexes with productively bound
substrates are available. The structures determined for
HIV-1 RT include its complexes with (i) dsDNA (10);
(ii) dsDNA and the incoming nucleotide (11); and (iii) an
RNA/DNA hybrid in which the RNA strand contains the
sequence of the HIV-1 3' PPT and flanking regions (12).
Mo-MLV RT
is the only monomeric RT for which structural informa- 
tion is available. Several structures of an N-terminal 
fragment comprising the fingers and palm subdomains 
have been reported, including structures containing 
dsDNA (13-15). In these structures, however, the duplex 
failed to contact critical active site residues of the DNA 
polymerase domain, and its position differed significantly 
from that in substrate complexes of HIV-1 RT and related 
DNA polymerases. These discrepancies were reconciled by 
the notion that such ternary complexes reflected an inter- 
mediate translocation state (14). The full-length Mo-MLV 
RT has also been crystallized, but only the DNA polymer- 
ase and connection subdomain were defined in the corres- 
ponding structure, and the RNase H domain was 
disordered (9). 

RNase H activity is essential for retrovirus replication 
(16) and is responsible for several critical steps of proviral 
DNA synthesis, including DNA strand transfer and gen- 
eration and specific removal of the tRNA and PPT 
primers. In contrast to HIV-1 RNase H, the isolated 
Mo-MLV domain retains activity but lacks specificity 
for some important intermediates in reverse transcription 
(17,18). The Mo-MLV RNase H domain contains a char- 
acteristic element designated the 'basic protrusion', which 
is absent from the HIV-1 enzyme. This motif is important 
for substrate binding and comprises a short helix and 
loop, which together form a bulge on the protein surface 
(19). Deleting the basic protrusion in the RNase H domain 
of Mo-MLV RT does not inhibit RNase H activity but 
blocks virus infectivity (20). In the structures of two 
gammaretroviral RNases H that were initially reported, 
the basic protrusion was removed to obtain crystals that 
diffracted X-rays to high resolution (21,22). Recently, a 
structure of the intact RNase H domain of the xenotropic 
murine leukaemia virus-related virus (XMRV), a close 
relative of Mo-MLV, has been determined (23). 

XMRV was originally proposed as the aetiological
agent of prostate cancer (24) and chronic fatigue
syndrome (25,26), but subsequent studies have
unequivocally dismissed this notion, showing that XMRV arose
through recombination following passaging of human
tumours in mice (27,28). Nevertheless, XMRV, a close
relative of Mo-MLV, remains a replication-competent
gammaretrovirus capable of infecting human cells.

Existing Mo-MLV RT structures provide only limited 
and fragmentary knowledge about the mechanism of 
action of monomeric RTs, and no structures of 
Mo-MLV RT in a complex with productively bound 
nucleic acid are available. Therefore, our aim was to 
solve a crystal structure of a monomeric RT in complex 
with an RNA/DNA hybrid. We elected to work on the 
enzyme from the VP62 isolate of XMRV, which, 
excluding an unstructured N-terminus, differs in the 
sequence of the polymerase and connection domains in 
only five positions from that of Mo-MLV RT (P30L, 
L234Q, Q238R, D422N, L463M); hence, the two 
enzymes can be considered essentially identical. Here, we
report the first crystal structure of a complex between the
polymerase-connection region of a monomeric
gammaretroviral RT and its substrate, together with
biochemical data that provide insights into the mechanism
of substrate binding. We also used our co-crystal structure
for a comprehensive comparison with HIV-1 RT to elucidate
structural and mechanistic similarities and differences
between these enzymes. Lastly, we present small-angle X-ray
scattering (SAXS) data for the full-length enzyme that,
together with modelling, provide insights into the
mobility and arrangement of the RNase H domain in
the context of a full-length XMRV RT monomer.

MATERIALS AND METHODS 

Crystallization 

Protein expression and purification are described in
Supplementary Information. Briefly, RT from XMRV
isolate VP62 was expressed in Escherichia coli strain
BL21 (DE3) Magic and purified on nickel, ion-exchange
and size-exclusion columns. HPLC-purified RNA and
DNA oligonucleotides were purchased from Metabion
International AG. The lengths of the oligonucleotides
used for crystallization were based on previous DNase I
footprinting data (29). Before crystallization, protein was
mixed with DNA/RNA hybrid in a 1:1.2 molar ratio
at a final protein concentration of 5 mg/ml. The DNA/RNA
hybrid (hybrid 1) was produced by annealing an RNA
oligonucleotide (5'-AACAGAGUGCGACACCUGAUUCCAU-3')
and a DNA oligonucleotide
(5'-TGGAATCAGGTGTCGCACTCTG-3'). The resulting
hybrid had a 22 bp duplex region and overhangs of the
RNA strand: a 2 nt overhang at the 5' end of the RNA and
a 1 nt overhang at the 3' end. Crystals of the nucleoprotein
complex were obtained at room temperature by
hanging-drop vapour diffusion. Initial crystallization
conditions were identified using the INDEX screen from
Hampton Research. Following optimization, the best
crystals were obtained by mixing 1 µl of protein-DNA/RNA
complex with 1 µl of reservoir buffer containing
0.2 M ammonium sulfate, 100 mM BisTris (pH 5.0) and
17% PEG3350, and adding 0.2 µl of 20% w/v
benzamidine to the drop. Before data collection, crystals
were cryoprotected by step-wise addition of 50% glycerol
to the crystallization drop to a final concentration of 25%
and flash frozen in liquid N2. The content of the crystals
was analysed by polyacrylamide gel electrophoresis
(sodium dodecyl sulphate-polyacrylamide gel
electrophoresis for protein and Tris-borate-EDTA-urea gel
for nucleic acid).
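
The layout stated above for hybrid 1 (a 2 nt 5' overhang, a 22 bp duplex and a 1 nt 3' overhang on the RNA strand) can be checked directly from the two oligonucleotide sequences. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes perfect Watson-Crick pairing over the entire DNA oligonucleotide.

```python
# Watson-Crick partner (as an RNA base) of each DNA base.
DNA_TO_RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def paired_rna_segment(dna_5to3: str) -> str:
    """Return the RNA stretch (5'->3') that pairs with a DNA strand given 5'->3'."""
    return "".join(DNA_TO_RNA_COMPLEMENT[base] for base in reversed(dna_5to3))


def hybrid_layout(rna_5to3: str, dna_5to3: str):
    """Return (5' RNA overhang, duplex length, 3' RNA overhang) in nucleotides.

    Returns None if the DNA is not fully complementary to a contiguous
    stretch of the RNA (mismatches and bulges are not handled).
    """
    segment = paired_rna_segment(dna_5to3)
    start = rna_5to3.find(segment)
    if start < 0:
        return None
    return start, len(segment), len(rna_5to3) - start - len(segment)


# Hybrid 1 sequences as given in the text.
HYBRID1_RNA = "AACAGAGUGCGACACCUGAUUCCAU"
HYBRID1_DNA = "TGGAATCAGGTGTCGCACTCTG"

print(hybrid_layout(HYBRID1_RNA, HYBRID1_DNA))  # prints (2, 22, 1)
```

The same check can be applied to any other RNA/DNA pair, provided both strands are supplied in the 5' to 3' direction.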

Diffraction data collection, structure solution 
and refinement 

X-ray diffraction data for selenomethionine crystals were
collected at beamline 14.1 of the Berliner
Elektronenspeicherring-Gesellschaft für
Synchrotronstrahlung (BESSY) (30) at the Se peak
wavelength of 0.979 Å, and native data were collected at
beamline 23-2 of the European Synchrotron Radiation
Facility (ESRF) on a Mar225 CCD detector at 100 K.
Diffraction data were processed and scaled with
HKL2000 (31). The diffraction data statistics are
summarized in Table 1. The structure was solved by
molecular replacement with the PHASER program (32),
using the Mo-MLV RT structure as the search model
(Protein Data Bank ID: 1RW3) (9). Iterative building with
COOT (33) was carried out, and refinement with Phenix (34)
was monitored throughout using R-free, calculated with
5% of unique reflections. In the final model, 99.5% of the
residues are within the allowed regions of the
Ramachandran plot.

In the DNA polymerase domain and connection
subdomain, several regions could not be traced due to
the lack of interpretable electron density: the extreme N
terminus (the His-tag and protein residues 1-27), two
loops in the fingers subdomain (residues 104-107 and 175-181),



Table 1. Data collection and refinement statistics of XMRV
RT-RNA/DNA complex crystals

                          Native                  SeMet (two crystals)
Data collection
  Space group             P4₃2₁2                  P4₃2₁2
  Cell dimensions
    a, b, c (Å)           98.1, 98.1, 201.8       97.9, 97.9, 201.3
    α, β, γ (°)           90, 90, 90              90, 90, 90
  Resolution (Å)          30-3.04 (3.09-3.04)*    50-3.4 (3.46-3.40)
  Rmerge                  9.9 (96.4)              16.0 (53.9)
  I/σI                    26.8 (2.7)              16.5 (1.6)
  Completeness (%)        100 (100)               94.5 (65.9)
  Redundancy              13.0 (13.3)             13.4 (6.5)
Refinement
  Resolution (Å)          3.04
  Number of reflections   19172
  Rwork/Rfree             22.4/28.0
  Number of atoms         4071
    Protein               3413
    Ligand/ion            621
    Water                 37
  B-factors (Å²)          75.3
    Protein               69.3
    Ligand/ion            109.6
    Water                 52.4
  R.m.s. deviations
    Bond lengths (Å)      0.011
    Bond angles (°)       1.01

*Values in parentheses are for the highest-resolution shell.



two residues from the thumb subdomain (330 and 331) and a
fragment of the connection subdomain (449-454). The last
residue of the connection subdomain that could be traced in
our structure is Pro487.

The composite simulated annealing omit maps were
calculated with Crystallography & NMR System (CNS)
1.3 (35) using the default parameters with 5% of the model
omitted at each step. Anomalous difference maps for the
selenomethionine data set were calculated both in CNS
1.3 and in Phenix, giving essentially the same results.
Structural analyses, including superpositions and
secondary structure assignments, were performed in PyMOL
(http://www.pymol.org). The same software was used to
prepare the structural figures. Nucleic acid geometry was
analysed with Curves+ (36). The structure was deposited in
the PDB under the accession code 4HKQ.

Biochemical studies of XMRV RT variants 

RNA-dependent DNA polymerase and RNase H
activities were simultaneously evaluated via the ability to
support DNA strand transfer (37). DNA synthesis was
initiated by adding 1 µL of RT (150 ng) to 9 µL of a
mixture containing 50 nM donor Cy5-RNA template/
Cy3-DNA primer, 250 nM acceptor RNA template and
200 µM dNTPs in 10 mM Tris-HCl (pH 7.8), 9 mM
MgCl2, 80 mM NaCl and 5 mM dithiothreitol. Samples were
incubated at 37°C for 5, 10, 20 and 40 min, then quenched
with an equal volume of 8 M urea in TBE buffer.
Polymerization and hydrolysis products were resolved by
high-voltage denaturing polyacrylamide gel
electrophoresis and visualized by fluorescent imaging
(Typhoon Trio+, GE Healthcare).

SAXS data analysis 

Synchrotron SAXS data were collected on the X33
beamline at EMBL (DESY, Hamburg, Germany) (38).
Protein buffer contained 10 mM HEPES (pH 6.5), 5%
glycerol, 100 mM KCl and 1 mM DTT. Samples were
prepared for XMRV RT alone and for a mixture of the
protein with DNA/RNA hybrids (hybrid 1, PPT-18 and
PPT-19) at a 1:1.2 molar ratio and a final protein
concentration of 0.9 or 1.8 mg/ml. The PPT substrates had
the following sequences. PPT-19: RNA strand,
5'-UAGUCUCCAGAAAAAGGGGGGAAUG-3'; DNA strand,
5'-ATTCCCCCCTTTTTCTGGAGAC-3'. PPT-18: RNA strand,
5'-AGUCUCCAGAAAAAGGGGGGAAUGA-3'; DNA strand,
3'-CATTCCCCCCTTTTTCTGGAGA-5'.

A Pilatus one-megapixel array detector (Dectris,
Switzerland) was used to record 15 s exposures. The
sample-to-detector distance was set to 2.7 m and covered
a range of momentum transfer 0.08 nm⁻¹ < s < 6.0 nm⁻¹
(s = 4π sin θ/λ, where 2θ is the scattering angle and
λ = 0.15 nm is the X-ray wavelength used in the
measurements). No measurable radiation damage was detected
by comparison of eight successive time frames with 15 s
exposures.
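
As a worked illustration of the momentum transfer definition above, the following Python sketch converts between scattering angle and s at the stated wavelength. The function names are hypothetical, and the conversion d = 2π/s to a nominal real-space distance is the usual convention rather than something stated in the text.

```python
import math

WAVELENGTH_NM = 0.15  # X-ray wavelength given in the text


def momentum_transfer(two_theta_deg: float, wavelength_nm: float = WAVELENGTH_NM) -> float:
    """s = 4*pi*sin(theta)/lambda in nm^-1, where 2*theta is the scattering angle."""
    theta = math.radians(two_theta_deg) / 2.0
    return 4.0 * math.pi * math.sin(theta) / wavelength_nm


def scattering_angle(s_inv_nm: float, wavelength_nm: float = WAVELENGTH_NM) -> float:
    """Inverse relation: scattering angle 2*theta in degrees for a given s."""
    theta = math.asin(s_inv_nm * wavelength_nm / (4.0 * math.pi))
    return 2.0 * math.degrees(theta)


def real_space_distance(s_inv_nm: float) -> float:
    """Nominal real-space distance d = 2*pi/s in nm (conventional, an assumption here)."""
    return 2.0 * math.pi / s_inv_nm


# Limits of the recorded s range quoted in the text.
for s in (0.08, 6.0):
    print(f"s = {s} nm^-1: 2theta = {scattering_angle(s):.3f} deg, "
          f"d = {real_space_distance(s):.2f} nm")
```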

All SAXS data manipulations were performed with the
PRIMUS software suite (39). The radius of gyration R_g and
forward scattering I(0) were calculated using the Guinier
approximation with PRIMUS's AutoRg program. Based
on the p(r) distribution obtained from GNOM (40),
maximum diameter values D_max were calculated for each
sample. Molecular weight estimates were calculated
based on the scattering profile of bovine serum albumin
used as a standard. A low-resolution envelope was
determined from the scattering data for the XMRV RT-PPT-19
complex. Thirty independent ab initio reconstructions
were obtained with DAMMIF (41) and averaged with
DAMAVER (42). The normalized spatial discrepancy
parameter for PPT-19 was in the range of 0.72-0.96, which
indicates high similarity of the reconstructions and that a
unique solution was identified. This reconstruction should,
however, be treated with caution, as it was generated
with DAMMIF, which cannot handle multiphase
(protein/nucleic acid) scattering data. Applying MONSA
(43), which is designed to process multiphase data, was
not possible for XMRV RT owing to differences between
its apo and complex conformations. The final
reconstruction was superimposed onto the atomic model of
the complex (with PPT-19 substrate) with SUPCOMB20
(44), taking into account the enantiomers.
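
The Guinier approximation used to obtain R_g and I(0) states that, at small angles, I(s) ≈ I(0) exp(−s² R_g²/3), so ln I(s) is linear in s². The Python sketch below shows that relationship on synthetic, noise-free data. It is not the AutoRg algorithm: AutoRg selects the fitting range automatically and takes experimental errors into account, whereas this example uses a fixed range and an unweighted fit, and the R_g and I(0) values are hypothetical.

```python
import math


def guinier_fit(s_values, intensities):
    """Unweighted least-squares line through (s^2, ln I); returns (Rg, I0)."""
    x = [s * s for s in s_values]
    y = [math.log(i) for i in intensities]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        raise ValueError("Guinier plot must have a negative slope")
    # slope = -Rg^2 / 3, intercept = ln I(0)
    return math.sqrt(-3.0 * slope), math.exp(intercept)


# Synthetic curve with hypothetical values (not taken from the paper).
TRUE_RG_NM = 3.5
TRUE_I0 = 100.0
s_grid = [0.08 + 0.01 * k for k in range(25)]  # nm^-1; s*Rg stays below ~1.3
curve = [TRUE_I0 * math.exp(-(s * TRUE_RG_NM) ** 2 / 3.0) for s in s_grid]

rg_fit, i0_fit = guinier_fit(s_grid, curve)
print(f"Rg = {rg_fit:.3f} nm, I(0) = {i0_fit:.2f}")
```

On real data the fit is restricted to the lowest-angle points, commonly up to about s·R_g ≈ 1.3 for globular particles, because the approximation breaks down at larger s.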

Inter-domain flexibility of the apo protein was initially
assessed with basic parameters inferred from the scattering
profile (R_g, D_max) and a Kratky plot, and subsequently
explored with the Ensemble Optimization Method (EOM) (45).
EOM consists of two separate programs: Random Chain
(RANCH) and Genetic Algorithm Judging Optimization
of Ensemble (GAJOE). RANCH generates a pool of
random models with the target sequence while preserving
structural fragments provided by the user. GAJOE selects
the optimal ensemble of models whose combined scattering
intensities best match the experimental data. Thus, EOM
allows for the coexistence of a number of conformations in
solution, providing a goodness-of-fit measure (χ) and an R_g
distribution for the selected ensemble.

To perform sampling of the RNase H domain position in
the vicinity of the RNA/DNA hybrid, an initial pool of
decoys was generated using the REFINER program (46).
The polymerase and RNase H domains were treated as rigid
bodies, whereas the inter-domain linker covered a range of
conformations. Additionally, an N-terminal fragment and
four terminal residues at the C-terminus were modelled. A
final non-redundant set of 29 733 decoys was selected by
clustering with a Cα root mean square deviation threshold
of 3 Å. Each decoy from this set was complemented with
the atomic model of the RNA/DNA hybrid. The
configurations clashing with the RNase H domain were
filtered out. As a result, a set of 26 617 decoys was
obtained. For each
model, fitting to the SAXS data was conducted with 
CRYSOL (47). A discrepancy χ, defined as

$$\chi = \sqrt{\frac{1}{N-1}\sum_{j=1}^{N}\left[\frac{I(s_j) - c\,I_{\mathrm{calc}}(s_j)}{\sigma(s_j)}\right]^{2}}$$

where N is the number of experimental points, c is a
scaling factor, and I_calc(s_j) and σ(s_j) are the calculated
intensity and experimental error at the momentum
transfer s_j, respectively (47), was calculated. In addition,
the distance between the active site of the RNase H domain
and the scissile phosphate in the substrate (separation of
the Cα atom of Asp534 and the phosphorus atom of the
nucleotide located 19 bp from the active site of the
polymerase domain) was measured for each model and
plotted against χ. Analogous analyses were performed for
the apo form. Owing to the absence of the RNA/DNA hybrid
in the structure, the RNase H domain-scissile phosphate
distance was measured with respect to a virtual point in 3D
space corresponding to the phosphorus atom.
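
The discrepancy χ between an experimental and a calculated scattering curve can be sketched in a few lines of Python. This is a minimal illustration of the formula, not CRYSOL itself: CRYSOL also adjusts other model parameters during fitting, which is not reproduced here. The function names are hypothetical, and the closed-form least-squares expression used for the scaling factor c is an assumption of this sketch.

```python
import math


def optimal_scale(i_exp, i_calc, sigma):
    """Scale c minimising sum(((I - c*I_calc)/sigma)^2) (weighted least squares)."""
    numerator = sum(e * m / (err * err) for e, m, err in zip(i_exp, i_calc, sigma))
    denominator = sum(m * m / (err * err) for m, err in zip(i_calc, sigma))
    return numerator / denominator


def chi(i_exp, i_calc, sigma, scale=None):
    """chi = sqrt( 1/(N-1) * sum_j [ (I(s_j) - c*I_calc(s_j)) / sigma(s_j) ]^2 )."""
    n = len(i_exp)
    if n < 2:
        raise ValueError("at least two data points are required")
    if scale is None:
        scale = optimal_scale(i_exp, i_calc, sigma)
    total = sum(((e - scale * m) / err) ** 2
                for e, m, err in zip(i_exp, i_calc, sigma))
    return math.sqrt(total / (n - 1))


# Toy data (hypothetical): the model curve is exactly half the measured one,
# so the optimal scale is 2 and the discrepancy vanishes.
measured = [10.0, 8.0, 6.0, 4.0]
model = [5.0, 4.0, 3.0, 2.0]
errors = [1.0, 1.0, 1.0, 1.0]

print(optimal_scale(measured, model, errors))
print(chi(measured, model, errors))
```

In the analysis described above, one such χ value per decoy would be paired with the corresponding RNase H active site-scissile phosphate distance.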



RESULTS AND DISCUSSION 

Structure of XMRV RT in complex with RNA/DNA 
substrate 

To gain insight into the mechanism of a gammaretroviral 
RT, we solved the crystal structure of the XMRV enzyme 
in complex with an RNA/DNA hybrid (Table 1 and 
Figure 1A). Details of the solution, refinement of the 
structure and its overall description can be found in 
Supplementary Information. The structure was solved by 
molecular replacement using the model of apo Mo-MLV 
RT (PDB code: 1RW3) (9), but significant portions of the 
thumb and connection subdomains required retracing 
(Supplementary Information, Supplementary Figures S1






Figure 1. Overall structure of XMRV RT in complex with RNA/DNA 
hybrid. (A) The protein is shown in cartoon representation with 
subdomains colour-coded pink for palm, cyan for fingers, yellow for 
thumb and green for connection. RNA template and DNA primer are 
coloured red and blue, respectively. (B) Surface representation of 
XMRV RT with electrostatic potential (±15kT/e) coded in blue 
(positive) and red (negative). Nucleic acid is shown in cartoon repre- 
sentation (yellow for RNA and green for DNA). 



3878 Nucleic Acids Research, 2013, Vol. 41, No. 6 



and S2). The DNA polymerase and connection domains 
of XMRV RT could be traced, but although the RNase H 
domain was present in the crystal, we failed to observe its 
electron density and consider it disordered. Analysis of the 
crystal packing interactions shows that there is enough 
space in the crystal to accommodate the RNase H 
domain (Supplementary Figure S4). For the nucleic acid, 
we traced 16 of 25 residues of the RNA and 14 of 22 
residues of the DNA strand (Supplementary Figure S3). 

We also corrected the apo Mo-MLV RT structure by 
changing the tracing in the thumb and connection 
domains to the one observed in our complex structure 
and refining the new model against 1RW3 structure 
factors deposited in the PDB. After those corrections, 
the apo structure resembles the XMRV RT protein from
our complex structure: individual domains of these two
RTs can be superimposed with low root mean square
deviations (RMSD) for Cα atoms of 0.7-1.2 Å
(Supplementary Table S1). However, considerable global
conformational changes occur in the presence of substrate 
and are described in the Supplementary Information and 
Supplementary Figure S5. 

Contacts with RNA/DNA 

The RNA/DNA hybrid in our structure interacts with all
domains of the protein (Figure 1A) and is comfortably
accommodated by the substrate-binding channel, which
is overall positively charged (Figure 1B). All but one
protein-nucleic acid interaction involve the
phosphodiester backbone, in agreement with the lack of
sequence specificity (Figure 2A). Overall, the protein covers
14 nt of the primer and 16 nt of the template strands, in
good agreement with the DNase I footprinting studies of
Mo-MLV RT lacking the RNase H domain (29). The minor
groove of the substrate has a width of ~9 Å in the vicinity
of the active site (nucleotides −3 and −4 of the template)
and ~10.5 Å around the thumb subdomain (nucleotides −5
and −6 of the template). These values indicate that both
strands adopt an A-form conformation. This is in
agreement with the fact that RTs are able to extend tRNA
primers on an RNA template; such dsRNA substrates can
only adopt a pure A-form. Further towards the connection
domain, the minor groove is ~8.5 Å wide, indicating
that the hybrid adopts an intermediate conformation
between A- and B-forms.

A comparison of the apo structures of Mo-MLV and 
HIV-1 RT (6,48) showed that residues comprising the 
DNA polymerase active site, and participating in 
binding of catalytic metal ions and the incoming dNTP, 
are highly conserved; thus, the active site architecture of 
the two enzymes is very similar (Supplementary Figure 
S6). Our structure provides further support for this con- 
servation by showing that the trajectory of the substrate 
near the DNA polymerase active site, and positioning of 
the 3'-OH of the DNA primer strand in particular, is 
superimposable. This implies that although the incoming 
dNTP and divalent metal ions are absent from our sub- 
strate complex, their mode of binding almost certainly 
parallels that of HIV-1 RT (11). A detailed comparison
of the polymerase active sites of XMRV and HIV-1 RTs
can be found in Supplementary Information. 

The template RNA in our structure contains a 5' 
overhang with terminal nucleotide +2 flipped out 
(Figure 2A and B). XMRV RT residues contributing to 
stabilization of this nucleotide are Tyr64 and Leu99. The 
aromatic ring of Tyr64 forms a stacking interaction with 
the base of the nucleotide, and this interaction is further 
stabilized in our crystals by a lattice contact 
(Supplementary Figure S4C). A characteristic feature of 
RTs is their ability to perform DNA synthesis concurrent 
with the displacement of downstream nucleic acid 
hybridized with the template. A Y64A substitution of 
Mo-MLV RT selectively reduced displacement synthesis 
(49), whereas a recombinant virus containing this 
mutation failed to replicate. KMnO4 probing showed
that displacement synthesis involves melting of base 
pairs +1 and +2 ahead of the DNA polymerase active 
site (50). Stacking between the unpaired base and Tyr64 
is likely a key element of this mechanism and is supported 
by observations that the aromatic ring of the amino acid is 
sufficient for displacement synthesis, as a Y64F variant 
exhibits wild-type enzymatic properties (49). The equiva- 
lent of Tyr64 in HIV-1 RT is Trp24, which was experi- 
mentally demonstrated to contribute towards substrate 
binding (51). 

In the XMRV RT co-crystal, RNA nucleotide +2 also
interacts through its 2'-OH group with fingers subdomain
residues Asp114 and Arg116 located opposite the
active site (Figure 2B). The side chain of Arg116 is
located ~3.5 Å from the phosphate of nucleotide +1 and
protrudes from the protein surface to form a 'pin' that
guides the template trajectory such that the base of
nucleotide +1 is positioned to pair with the incoming dNTP
(Figure 2B). To fulfill this function, the pin must be
rigid, and the conformation of the Arg116 side chain is
stabilized by a strong ionic interaction with Asp114.
Biochemical data confirm the importance of both
Arg116 and Asp114. Substituting Arg116 of Mo-MLV
RT with Lys or Leu reduced DNA polymerase activity
on homopolymeric RNA/DNA and abolished activity
on a template with random sequence (14). Similarly,
substituting Asp114 with Asn reduced activity on a
homopolymeric RNA template by 60% and on a
random-sequence RNA template ~5-fold. Mutating either
residue also inhibited virus replication (52,53). Proteins with
D114A or R116A substitutions were also significantly less
processive, were unable to resolve hairpins in the template
and displayed reduced affinity for nucleic acid (52). The
HIV-1 RT counterparts of these residues are Asp76 and
Arg78. When Arg78 was substituted with Lys, HIV-1 RT
retained 50% of its DNA polymerase activity (54), and a
later study showed that an R78A substitution increased
fidelity and decreased affinity for DNA and RNA
templates (55).

Template nucleotides +1, −1 and −3 form hydrogen
bonds through their 2'-OH groups with backbone
carbonyls of Gly191, Lys193 (both located at the boundary
of the fingers and palm subdomains) and Pro130 from the
palm (Figure 2C). These interactions could underlie a
preference for RNA as the template strand. Indeed,
HIV-1 RT binds RNA/DNA substrates ~10-fold tighter
than dsDNA (56). Moreover, the 5 terminal bp of RNA/DNA
located at the polymerase active site were sufficient
for enhanced binding (56). The HIV-1 counterparts of
Gly191 and Lys193 are Gly152 and Lys154, respectively,
and the carbonyl of Lys154 forms an interaction with the
RNA 2'-OH in the HIV RT-RNA/DNA complex (PDB
ID: 1HYS) (12). Pro130 has no clear equivalent in HIV-1
RT; perhaps its role is fulfilled by Gln91. In XMRV RT,
Gly191 and Lys193 are located in a loop whose
conformation is stabilized by a hydrogen bond between the
side chain of Asn119 and the backbone amide of Lys193.
This stabilization likely enhances substrate binding.
Interestingly, an N119A substitution in Mo-MLV RT
reduced DNA polymerase activity by 50% on a
homopolymeric RNA template and completely abolished
activity on random-sequence RNA (14). Further towards the
3' end of the RNA template, the phosphate groups of
nucleotides −5 to −8 interact with a positively charged
patch on the surface of the connection domain formed
by the side chains of Lys397, Lys398 and Lys425, which
have no obvious equivalents in HIV-1 RT.

Figure 2. Substrate binding. (A) Diagram showing interactions between XMRV RT and nucleic acid. Ovals are coloured according to protein
subdomains of Figure 1. Black outline denotes residues for which equivalents can be found in HIV-1 RT (putative equivalent denoted with dashed
oval). Stacking of Tyr64 with the RNA overhang is shown as parallel lines. Van der Waals interactions are shown as grey dashed lines and polar
interactions as blue dashed lines. (B) Interaction of XMRV RT with the terminal portion of the RNA/DNA duplex (shown in red and blue for RNA
and DNA, respectively). XMRV RT protein is in surface representation. Tyr64 and residues forming the 'pin' stabilizing the conformation of the
template in front of the incoming nucleotide are shown as cyan sticks, and the dNTP modelled based on the HIV-1 RT structure (PDB ID: 1RTD) is
shown as dark grey sticks. (C) Interactions with the 2'-OH groups of the template. The RNA strand is shown as red sticks and 2'-OH groups as
spheres. Hydrogen bonds are indicated with dashed lines. (D) Binding of primer nucleotides by residues of the thumb subdomain. In panels (B) and
(D), a composite simulated annealing omit map contoured at 1σ is overlaid on the substrate (blue mesh).

For interactions with the primer strand, the phosphate
group of the penultimate DNA nucleotide is strongly
bound by the guanidinium group of Arg284 located in
the first helix of the thumb subdomain (Figure 2D). No
clear equivalent of this residue exists in HIV-1 RT.
Nucleotides −3, −4 and −5 form van der Waals
interactions with the side chains and backbone of α-helix F
of the thumb, which is inserted into the minor groove of
the substrate. We note a minor deformation of the primer
backbone in this region, although we cannot exclude the
possibility that it is induced by a displacement of the
thumb domain resulting from crystal packing interactions
(Supplementary Figure S4B). A prominent interaction
involving α-helix F is stacking of the ribose ring of
DNA nucleotide −3 with Phe309. If a 2'-OH were
present in the ribose ring of the nucleotide, it would
make stacking less effective, and therefore this
interaction can select against ribonucleotides in the primer
strand. The HIV-1 RT counterpart is Trp266, whose
substitution completely abolishes DNA polymerase activity
(57). RTs are known to extend RNA primers poorly,
other than those of (−) and (+) strand synthesis (tRNA
and the PPT, respectively) (3). Perhaps this specificity is
partially conferred by Phe309 in XMRV RT and Trp266
in HIV-1 RT.

A notable interaction is made by the phosphate group of
nucleotide −5 of the DNA strand with Arg298 and Arg301
of α-helix F of the thumb domain, equivalent to HIV-1
residues Gln258 and Asn255, respectively (Figure 2D).
The side chains of the two residues form 'tweezers' that,
together with Glu302, hold the backbone of the DNA
strand. The following fragment of the DNA strand does
not interact with the protein, but nucleotide −12 forms an
interaction with Arg456 and nucleotide −13 a hydrogen
bond with Trp406. Both residues are located in the
connection subdomain and lack obvious equivalents in HIV-1
RT. Interestingly, when a PPT substrate is modelled into
the structure of XMRV RT (see later in the text), the
isolated substrate contact mediated by Trp406 and
Arg456 is at the boundary between the A- and G-tracts,
which has been shown to be an element critical for PPT
recognition (58). It is tempting to speculate that this
contact may be more efficient for PPT owing to the
special structure of the A-tract, which leads to better
positioning of the substrate for RNase H cleavage and a
kinetic preference for hydrolysis at the PPT-U3 junction.

In summary, the protein-nucleic acid interactions can
be divided into several segments. For template binding,
these are the interactions with the RNA overhang (mediated
by Tyr64 and the 'pin'), followed by a region of
interactions with 2'-OH groups and subsequently by a
positively charged patch binding the backbone of the
RNA. For the primer, most of the interactions are
mediated by the thumb, followed by an isolated interaction
with the connection domain. In the vicinity of the active
site, protein-nucleic acid contacts are conserved between
HIV-1 and XMRV RT. However, further towards the
connection domain, substrate binding is mediated by a
different set of residues.

Site-directed mutagenesis of substrate contacts 

Site-directed mutagenesis was used to assess the 
importance of novel nucleic acid contacts that lack equivalents 
in HIV-1 RT. We prepared the XMRV RT variants K397A/K398A, 
R311A/K425A and W406A/R456A, with substitutions 
in the thumb and connection domains, and tested 
their ability to support DNA strand transfer, which 
simultaneously reports on the DNA polymerase and 
RNase H activities of the protein. This assay comprised 
a Cy5 5'-labelled donor RNA template annealed to a Cy3 
5'-labelled DNA primer in the presence of a nucleotide 
acceptor RNA template. Donor and acceptor RNA tem- 
plates shared 20 homologous nucleotides at their 5'- and 
3'-termini, respectively. Initial RNA-dependent DNA syn- 
thesis produces a 40 nt strand transfer intermediate (STI) 
and RNase H-mediated strand transfer and subsequent 
DNA synthesis lead to a 60 nt product (STP) (Figure 3A). 

Data in Figure 3B show that the 40 nt STI is efficiently 
synthesized by all proteins, indicating that contacts 
between the connection domain and the substrate do not 
play a major role in DNA polymerase activity. They do, 
however, affect RNase H function: for variants 
K397A/K398A and W406A/R456A, we observe reduced 
RNase H activity, as evidenced by a slower decrease in the 
T40 RNA template. In addition, W406A/R456A RT 
displayed altered cleavage specificity, generating 21 and 20 nt 
fragments without any shorter products. The R21/20 
cleavage corresponds to the positioning of the substrate 
in which the blunt end of the RNA/DNA STI is stably 
bound at the polymerase active site. For the other 
cleavage sites (R18/17 and in particular R14/13), the end of 
the hybrid shifts from the polymerase domain and 
substrate binding involves only interactions with the thumb 
and connection domains. One of these important 
contacts is lost in the W406A/R456A variant 
(Figure 3C), which likely explains why the cleavages at 
R21/20 involving interactions between the hybrid and the 
polymerase domain are preferred. Unexpectedly, the 
R311A/K425A variant displayed enhanced RNase H activity and 
consequently a higher rate of strand transfer. It is not clear 
what causes this increase in activity, but overall our 
data show that the connection domain participates in cor- 
rectly positioning the substrate for RNase H cleavage. 

Comparison with structures of HIV-1 RT 

We compared the structures of XMRV RT and HIV-1 RT 
containing (i) an RNA/DNA hybrid (PDB ID: 1HYS) (12) and 
(ii) dsDNA and the incoming nucleotide (PDB ID: 1RTD) 
(11). Figure 4A shows a structure-based alignment for the 
two proteins. Individual subdomains of XMRV and 
HIV-1 RT are similar, and the palm, fingers, thumb and 
Figure 3. Biochemical characterization of the XMRV RT variants. The DNA strand transfer assay is outlined schematically in (A) and described in the 
main text. The results are shown in (B). Notations P20 and T40 indicate migration positions of the Cy3-labelled, 20 nt DNA primer and Cy5-labelled RNA 
template, respectively, before initiation of DNA synthesis. Full-length primer extension on the donor RNA template is evidenced by accumulation of the 
strand transfer intermediate, STI40, whereas RNase H hydrolysis products are defined by R21/20, R18/17 and R14/13. Transfer of nascent DNA to the 
acceptor RNA template and continued DNA synthesis is evidenced by accumulation of the 60 nucleotide strand transfer product, STP60. Samples were 
withdrawn after 5 min (lanes a), 10 min (lanes b), 20 min (lanes c) and 40 min (lanes d) for analysis. Positions of Ala substitutions in XMRV RT are 
indicated below each panel and shown in (C) with lime green for K397A/K398A, blue for R311A/K425A and purple for W406A/R456A. 



connection can be superimposed with RMSDs between 
1.2 and 1.7 Å (Table 2). However, we noted that α-helix 
E (residues 282-290) of the XMRV RT thumb is replaced 
by an extended fragment in HIV-1 RT (Figure 4A). 
Consensus secondary structure predictions, calculated 
using the GeneSilico metaserver (59), suggest that an 
equivalent α-helix exists in RTs from gamma- and 
spumaretroviruses, whereas for lenti-, alpha-, beta- and 
deltaretroviral RTs, this fragment is predicted to be 
extended (Supplementary Figure S7). Therefore, the 
presence of this α-helix defines two classes of RTs. In 
XMRV, this α-helix harbours Arg284 which, as described 
earlier in the text, forms an important contact with the 
primer strand. 

A more significant difference is observed in the 
connection subdomain. In XMRV RT, we traced α-helix L 
between residues 456 and 465. This helix contains two 
methionines, and we could verify our tracing by 
anomalous difference maps for the selenomethionine data set 
(Supplementary Figure S2C). The corresponding region 
in HIV-1 RT forms an extended structure located at the 
p66-p51 interface, where an equivalent helix cannot be 
accommodated. Moreover, in the p51 subunit, owing to 
its altered conformation, the region corresponding to 
α-helix L of XMRV RT is tightly packed between the 
connection, palm and fingers, preventing accommodation of 
an α-helix. Therefore, converting the α-helical region to an 
extended structure likely reflects adaptation to the dimeric 
architecture of lentiviral RTs. 

The overall structures of proteins in the substrate 
complexes of XMRV and HIV-1 RTs are similar. The 
palm, thumb and connection subdomains of XMRV RT 
(198 Cα atoms) can be superimposed on structures of 
HIV-1 RT with an RMSD of 2.0 and 1.9 Å for 1HYS 
and 1RTD, respectively. One key difference is the positioning 
of their fingers subdomains. These superimpose well 
between the XMRV RT-substrate complex and the HIV-1 
ternary complex (PDB ID: 1RTD), adopting a 
'half-open' conformation (11), whereas the conformation 
in the HIV-1 RT-RNA/DNA complex (PDB ID: 1HYS) 
is more open (Figure 4B and D). Therefore, our structure 
may more closely resemble an elongating complex in the 
presence of the incoming dNTP. 
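
For readers reproducing such comparisons, an RMSD of this kind reduces to a root mean square distance over paired Cα atoms once the two structures have been superimposed. The Python sketch below illustrates only that final step. The function name `rmsd` and the coordinates are hypothetical, and the least-squares superposition itself is assumed to have been carried out beforehand; this is not a description of the software used in this work.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two paired lists of (x, y, z) points.

    Assumes the two sets are already superimposed and listed in matching order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical, already-superimposed C-alpha positions in angstroms
# (illustrative values, not taken from any PDB entry).
set_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
set_b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(set_a, set_b))
```

Computing the same quantity for a subdomain that was not used in the fit gives the kind of cross-subdomain value that indicates how far that subdomain has moved relative to the superimposed one.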

We next compared nucleic acid positioning and trajec- 
tory in our structure versus RNA/DNA or dsDNA from 
HIV-1 RT structures. This differs between the two RTs, 
Figure 4. Comparison of gammaretroviral and lentiviral RT structures. (A) Structure-based alignment of the sequences of HIV-1 and XMRV RT. 
Residues involved in forming the active site and in binding of the incoming dNTP are highlighted in yellow, whereas those involved in binding 
template and primer are highlighted in red and blue, respectively. Critical functional residues are indicated and the line between sequences colour 
codes the subdomains as in Figure 1A. α-helices are indicated as tubes and β-strands as arrows and labelled. (B and D) Superposition of the palm/ 
fingers subdomain from XMRV RT complex structure, HIV-1 RT in complex with RNA/DNA (PDB ID: 1HYS) (B) and HIV-1 RT complexed with 
dsDNA (PDB ID: 1RTD) (D). XMRV RT palm subdomain is in pink and fingers in cyan. HIV-1 RT structure is shown in orange. For clarity, only the 
substrate from our XMRV RT structure is presented. (C and E) Palm subdomain-based superposition of XMRV RT complex structure with 
structures of HIV-1 RT complexed with RNA/DNA (PDB ID: 1HYS) (E) and dsDNA (PDB ID: 1RTD) (C). HIV-1 RT is shown in orange for 
p66 subunit, and the RNase H is shown in darker colour. The p51 subunit and XMRV RT are omitted for clarity. The RNA/DNA hybrid from 
XMRV RT is shown in red (RNA) and blue (DNA) and the substrates from HIV RT structures in pink (RNA) and cyan (DNA). The axes of the 
nucleic acid are shown as spheres (dark blue for XMRV RT complex and cyan for HIV-1 complexes). 
Table 2. Superpositions of the structures of substrate-bound XMRV and HIV-1 RTs 

Superimposed domain (total number of traced Cα atoms in each XMRV RT domain given in parentheses); RMSD (Å) calculated for: 

                                          Substrate  Fingers  Palm   Thumb  Connection 

RNA/DNA complex, 1HYS 
Substrate 30 pairs of atoms^a             0.5        5.S      2.3    3.1    4.1 
Fingers 84 pairs of Cα atoms (of 106)     7.3        1.7      8.2    14.9   17.1 
Palm 91 pairs of Cα atoms (of 136)        1.1        4.6      1.3    3.1    3.1 
Thumb 53 pairs of Cα atoms (of 79)        2.9        6.1      5.3    1.6    2.3 
Connection 54 pairs of Cα atoms (of 120)  4.7        8.1      5.7    3.1    1.6 

dsDNA complex, 1RTD 
Substrate 24 pairs of atoms^a             0.4        2.4      1.8    3.0    3.4 
Fingers 84 pairs of Cα atoms (of 106)     1.6        1.2      3.6    3.7    6.2 
Palm 91 pairs of Cα atoms (of 136)        0.7        2.7      1.3    3.0    3.2 
Thumb 53 pairs of Cα atoms (of 79)        2.7        3.5      4.4    1.5    2.8 
Connection 54 pairs of Cα atoms (of 120)  5.2        7.3      5.8    3.1    1.6 

The individual subdomains were superimposed [the resulting root mean square deviations (RMSDs) of pairs of Cα atoms are the diagonal entries] and 
the RMSD values for the other subdomains are given. 

^a Phosphodiester backbone atoms of nucleotides -1, -2 and -3 of the primer and template strands were used for superposition. 



regardless of whether the palm subdomain or the terminal 
region of the substrate is used for superposition 
(Figure 4C and E). The HIV-1 and XMRV RT substrates 
are superimposable for the first 3 bp going from the 
polymerase active site, but after this region, their 
trajectories differ. In our XMRV RT structure, the sub- 
strate passes closer to the connection subdomain, which 
would not be possible for HIV-1 RT, as it would invoke 
clashes with both the connection and RNase H domain. 
The trajectory of XMRV RT substrate results in forma- 
tion of an isolated contact between the substrate and the 
connection domain mediated by Trp406 and Arg456 
which, as described earlier in the text, may play a role in 
PPT recognition. 

Model of full-length XMRV RT and its verification 
through SAXS experiments 

As the RNase H domain is not visible in our structure, we 
prepared models of the full-length XMRV RT complex by 
combining our structure with those of XMRV RNase H 
(23) and human RNase H1 in complex with RNA/DNA 
(19) (described in more detail in Supplementary 
Information). Models with a polymerase-RNase H 
distance of 18 or 19 bp are free of steric clashes, whereas 
any distance shorter than 18 bp imposes a severe clash 
between the RNase H and connection domains. This is 
in agreement with biochemical data, which show that 
very little 3'-end directed cleavage by Mo-MLV occurs 
at distances 17 bp or closer to the polymerase active site 
(60). The model with 19 bp separation between the poly- 
merase and RNase H active sites is shown in Figure 5A. 

To verify the correctness of the model and to provide 
further insights into the structure of the full-length 
enzyme, XMRV RT was examined in the absence and 
presence of RNA/DNA hybrids by SAXS. Complexes 
with several hybrids were examined, including the duplex 
used for crystallization (hybrid 1) and hybrids with 
strands of the same length but a sequence corresponding 
to the Mo-MLV PPT, which differs from the XMRV PPT 
by one nucleotide. Substrates with preferred RNase H 
cleavage sites located 18 nt (PPT-18) and 19 nt (PPT-19) 
from the 3'-end of the recessed DNA were used. We first 
calculated the radius of gyration (Rg), maximum particle 
dimension (Dmax) and particle volume values based on 
SAXS data (Table 3). For all three parameters, the 
values were larger for protein alone and the complex 
with hybrid 1 than for complexes with hybrids PPT-18 
and PPT-19, implying that the PPT hybrids 
induced a more compact structure. This may reflect 
multiple positions of the RNase H domain for the 
protein alone and the hybrid 1 complex and a more ordered 
RNase H domain interacting with the PPT substrates. 
Therefore, for further analysis of the model of full-length 
protein in complex with the substrate described earlier in 
the text, we used SAXS data for the PPT-19 complex. These 
data show very good agreement with the theoretical 
scattering curve calculated from the model, with a χ value of 
0.92 (Figure 5B). We also calculated 30 independent 
ab initio reconstructions for PPT-18 and PPT-19 data. 
The PPT-19 averaged filtered reconstruction showed very 
good agreement with our model of the full-length 
protein (χ = 0.89) (Figure 5A). 
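
As an illustration of how a discrepancy value of this kind can be obtained, the Python sketch below computes a χ between an experimental and a model scattering curve after least-squares scaling of the model. It uses one commonly quoted definition, χ² = Σ[(I_exp − c·I_calc)/σ]² / (N − 1); whether this matches the exact form implemented in the programs used here is an assumption, and the function names and intensity values are hypothetical.

```python
import math


def scale_factor(i_exp, i_calc, sigma):
    """Least-squares scale c minimising sum(((i_exp - c * i_calc) / sigma) ** 2)."""
    num = sum(e * m / s ** 2 for e, m, s in zip(i_exp, i_calc, sigma))
    den = sum(m * m / s ** 2 for m, s in zip(i_calc, sigma))
    return num / den


def chi(i_exp, i_calc, sigma):
    """Discrepancy between experimental and scaled model intensities.

    Assumed definition: chi^2 = sum(((I_exp - c * I_calc) / sigma)^2) / (N - 1).
    """
    n = len(i_exp)
    if n < 2 or len(i_calc) != n or len(sigma) != n:
        raise ValueError("need at least two points and equally long inputs")
    c = scale_factor(i_exp, i_calc, sigma)
    chi2 = sum(((e - c * m) / s) ** 2
               for e, m, s in zip(i_exp, i_calc, sigma)) / (n - 1)
    return math.sqrt(chi2)


# Hypothetical intensities and experimental errors, for illustration only.
model_curve = [10.0, 8.0, 5.0, 2.0]
measured_curve = [20.5, 15.8, 10.1, 4.2]
errors = [0.5, 0.4, 0.3, 0.2]
print(chi(measured_curve, model_curve, errors))
```

With this definition, values close to 1 indicate that the model reproduces the data to within the experimental errors.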

We next explored further the potential mobility of the 
RNase H domain. We first used Normal Mode Analysis 
scored against SAXS data (Supplementary Information), 
which indicated mobility of RNase H domain in apo 
protein and confirmed the correctness of the full-length 
XMRV RT substrate complex model. We next applied 
EOM (45) using models with random RNase H domain 
positions based on coarse-grained representation of the 
linker between the connection and RNase H domain. 
The theoretical scattering curve calculated based on an 
ensemble selected in EOM showed very good agreement 
with the experimental SAXS data for apo protein 
(Figure 5C), with a χ of 0.91. We noted a bimodal distribution 
of models selected by EOM, with a larger fraction 
in a compact conformation (Rg between 35 and 40 Å) 
and a minor fraction in extended conformations 
(Rg between 43 and 48 Å) (Figure 5D and E). From this 
result, we conclude that the RNase H domain is mobile, 
but its positioning is not completely random: there are two 
preferred regions, between which it can easily alternate. 
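
The Rg ranges quoted for the EOM models are in Å, whereas Table 3 reports Rg in nm; the apo value of 3.95 nm corresponds to 39.5 Å, which falls within the compact range. The Python sketch below shows how a geometric Rg can be obtained from a set of coordinates and converted between the two units. It is a simplified illustration that treats all points as equally weighted, which is an assumption (SAXS-derived values also reflect electron density and the hydration layer), and the function names and coordinates are hypothetical.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration of a list of (x, y, z) points."""
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one point")
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                  for x, y, z in coords) / n
    return math.sqrt(mean_sq)


def nm_to_angstrom(value_nm):
    """Convert a length from nanometres to angstroms (1 nm = 10 angstroms)."""
    return value_nm * 10.0


# Hypothetical points placed at unit distance from their centroid.
points = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
print(radius_of_gyration(points))

# Apo Rg from Table 3, expressed in angstroms.
print(nm_to_angstrom(3.95))
```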
Figure 5. Full-length XMRV RT model and SAXS data. (A) Model of the full-length protein based on the XMRV RT substrate complex structure 
of Figure 1A, and structures of human RNase H1 and the isolated XMRV RNase H domain. The modelled RNase H domain is depicted in orange. 
Disordered regions are in grey (only a single modelled conformation is shown for clarity). The 19 bp distance between the DNA polymerase and 
RNase H active sites was assumed for model preparation. An ab initio calculated SAXS density map for the XMRV RT-PPT-19 complex is overlaid on 
the model (grey). (B and C) Experimental (black circles and grey error bars) and theoretical (red line) scattering curves for the XMRV RT-PPT-19 
complex (B) and apo protein (C), computed by CRYSOL and EOM, respectively. The plots display the logarithm of the scattering intensity as a 
function of momentum transfer s = 4π sin θ/λ (where 2θ is the scattering angle and λ = 0.15 nm is the X-ray wavelength used in the measurements). 
(D) EOM analysis of the SAXS data for apo protein. The frequencies of models with particular Rg values are plotted for the pool used in EOM (red) 
and selected to fit the SAXS data for apo XMRV RT (blue). (E) Two examples of models selected in EOM analysis, superimposed based on the 
polymerase-connection domains, show different positioning of the RNase H domain (orange). The coarse-grained models of the linker and terminal 
regions are shown as small spheres for one model only. 



To better probe potential interactions between the 
RNase H domain and its substrate, we prepared another 
ensemble with shorter distances between the RNase H 
domain and the hybrid and with full-atom representation 
of the N terminus and linker region. The ensemble 
contained 29 733 models for the apo form and 26 617 models for 
the complex (protein models that clashed with the RNA/DNA 
hybrid were removed from the complex ensemble). Models 
were subsequently scored for their agreement with SAXS 
data for both the apo protein and the XMRV RT-PPT-19 
complex. For each model, the score was plotted versus the 
distance between the active site of the RNase H domain 
(Cα position of Asp534) and the position of the phosphorus 
atom of the scissile phosphate of the PPT-19 substrate. 
SAXS data for the apo protein (Supplementary Figure 
S8A) show minimal correlation of the score of the 
Table 3. Parameters derived from SAXS experiments 

            Rg (nm)        Dmax (nm)     Vp (nm^3) 

apo         3.95 ± 0.10    13.5 ± 0.5    160 ± 10 
hybrid 1    3.80 ± 0.10    13.0 ± 0.5    172 ± 10 
PPT-18      3.60 ± 0.10    12.0 ± 0.5    155 ± 10 
PPT-19      3.60 ± 0.10    12.0 ± 0.5    155 ± 10 

Rg, radius of gyration; Dmax, maximum size of the particle; Vp, 
excluded volume of the hydrated particle estimated from Porod 
asymptotics. 



model with the distance between the position of the 
substrate and the RNase H domain, demonstrating that 
different positions of the RNase H domain fit equally well to the 
SAXS data. In contrast, for the RT-PPT-19 complex data, 
models with a shorter distance between the active site of the 
RNase H domain and the scissile phosphate show better χ 
scores than models with this domain located further from 
the hybrid (Supplementary Figure S8B), indicating that 
for the PPT-19 complex, the RNase H domain interacts 
with the substrate. 
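
The score-versus-distance analysis can be illustrated with a short Python sketch that measures the distance between two atoms in each model and then quantifies how strongly the fit score tracks that distance. The Pearson coefficient is used here only as one convenient measure of correlation; the text does not state which measure, if any, was applied, so this choice is an assumption, and the function names and numerical values are hypothetical.

```python
import math


def distance(p, q):
    """Euclidean distance between two (x, y, z) points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n < 2 or len(ys) != n:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-model values: active-site-to-phosphate distance (angstroms)
# and the corresponding fit score. Illustrative only, not data from this study.
distances = [5.0, 10.0, 20.0, 40.0]
scores = [0.9, 1.0, 1.4, 2.1]
print(pearson(distances, scores))
```

A coefficient near zero would correspond to the behaviour described for the apo protein, whereas a clearly positive value (worse scores at larger distances) would correspond to that described for the PPT-19 complex.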

In conclusion, the Normal Mode Analysis, EOM and 
the χ versus distance plot analysis indicate mobility of the 
RNase H domain in the absence of the substrate and its 
ordering on the PPT hybrids. No such ordering is 
observed for hybrid 1, for which the SAXS data are 
similar to the apo protein, probably reflecting higher 
affinity of the RNase H domain for the PPT sequence 
over the random sequence in hybrid 1. Furthermore, the 
SAXS data support our model of full-length XMRV RT 
interacting with the substrate. 



CONCLUSIONS 

Our studies of XMRV RT provide the first comprehensive 
structural analysis of the interaction of a monomeric 
gammaretroviral enzyme with an RNA/DNA substrate. 
We show here that interactions between the DNA poly- 
merase domains and the substrate, as well as the active site 
composition, are highly conserved among monomeric and 
dimeric RTs. However, a pronounced difference is the 
positioning and mobility of their RNase H domains. In 
HIV-1 RT, this domain is relatively rigidly positioned by 
the p66 connection subdomain and p51 DNA polymerase 
domain. In contrast, as demonstrated by our SAXS data, 
the XMRV RNase H domain is mobile in the absence of 
the substrate. RNase H activity is responsible for the 
mechanistically more intricate steps of reverse transcrip- 
tion, such as DNA strand transfer or generation and 
specific removal of the PPT primer. Given the structural 
differences between monomeric and dimeric RTs, it is 
interesting that some properties of the RNase H activity 
of the two are conserved. For example, Mo-MLV RT can 
use the HIV-1 PPT sequence for priming (+) strand syn- 
thesis (61), and HIV-1 RT can use Mo-MLV PPT with 
only slightly affected cleavage specificity (62). Further 
structural studies should elucidate the atomic details of 
RNase H function in the context of these RTs. 



SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online: 
Supplementary Materials and Methods, Supplementary 
Results, Supplementary Table 1, Supplementary Figures 
1-8 and Supplementary References [63-81]. 